Round
=================


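Rounds each element to the nearest integer.

The operator rounds every element of the input tensor to the nearest integer.
When the fractional part of an input value is exactly 0.5, it rounds away from zero
(for example, 2.5 becomes 3.0 and -2.5 becomes -3.0),
matching the C standard library functions ``round`` / ``roundf``.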

.. math::

    \text{output}_i = \operatorname{round}(\text{input}_i)

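The rounding rule above can be reproduced on a host machine as a reference for checking operator output. The sketch below is only an illustration under stated assumptions: ``ref_round_f32`` is a hypothetical helper name, not part of this library, and it simply applies the C standard ``roundf`` to each element.

.. code-block:: c

    #include <math.h>

    /* Hypothetical host-side reference, not part of the operator library:
       rounds each element to the nearest integer, ties away from zero. */
    void ref_round_f32(const float *input, float *output, int length) {
        for (int i = 0; i < length; ++i) {
            output[i] = roundf(input[i]);
        }
    }

With this helper, the inputs 0.5, 1.5, 2.5 and -2.5 give 1.0, 2.0, 3.0 and -3.0. By contrast, ``rintf`` in the default rounding mode rounds ties to even and would give 0.0, 2.0, 2.0 and -2.0.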

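Inputs:
    - **input** - Address of the input tensor data.
    - **length** - Total number of elements in the input tensor.
    - **core_mask** - Core mask (shared-memory version only).

Outputs:
    - **output** - Address of the output tensor data; same size as ``input``.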

Supported platforms:
    ``FT78NE``
    ``MT7004``

.. note::
    - Data types supported on FT78NE: fp32, fp64
    - Data types supported on MT7004: fp16, fp32
    - For ``±∞`` or ``NaN`` inputs, the output follows the rules of the corresponding platform's math library

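When validating operator output against a host reference, non-finite inputs need separate handling, because ``NaN`` never compares equal to itself. The following is a minimal sketch, assuming a host C library that follows IEEE 754 (C Annex F), where ``roundf`` returns ``±∞`` and ``NaN`` inputs unchanged. The target platform's math library may behave differently, and ``round_result_matches`` is a hypothetical helper name, not part of this library.

.. code-block:: c

    #include <math.h>
    #include <stdbool.h>

    /* Hypothetical check, not part of the operator library: compares one
       operator result against the host's roundf for the same input. */
    bool round_result_matches(float input, float actual) {
        float expected = roundf(input);
        if (isnan(expected)) {
            return isnan(actual);      /* NaN != NaN, so test explicitly */
        }
        return expected == actual;     /* also covers +inf and -inf */
    }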

**Shared-memory version:**

.. c:function:: void fp_round_s(float* input, float* output, int length, int core_mask)
.. c:function:: void dp_round_s(double* input, double* output, int length, int core_mask)
.. c:function:: void hp_round_s(half* input, half* output, int length, int core_mask)


**C usage example:**

.. code-block:: c
    :linenos:
    :emphasize-lines: 12

    // FT78NE multi-core example
    #include <stdio.h>
    #include <round.h>

    int main(int argc, char* argv[]) {
        float *input  = (float *)0xA0000000;   // input resides in DDR space
        float *output = (float *)0xB0000000;   // output resides in DDR space

        int length = 4096;
        int core_mask = 0xff;

        fp_round_s(input, output, length, core_mask);
        return 0;
    }


**Private-memory version:**

.. c:function:: void fp_round_p(float* input, float* output, int length)
.. c:function:: void dp_round_p(double* input, double* output, int length)
.. c:function:: void hp_round_p(half* input, half* output, int length)


**C usage example:**

.. code-block:: c
    :linenos:
    :emphasize-lines: 11

    // MT7004 single-core example
    #include <stdio.h>
    #include <round.h>

    int main(int argc, char* argv[]) {
        half *input  = (half *)0x10000000;   // input resides in L2 space
        half *output = (half *)0x10010000;   // output resides in L2 space

        int length = 1024;

        hp_round_p(input, output, length);
        return 0;
    }